Abstract: N-subjettiness is a jet shape designed to identify boosted hadronic objects such as top quarks. Given N subjet axes within a jet, N-subjettiness sums the angular distances of jet constituents to their nearest subjet axis. Here, we generalize and improve on N-subjettiness by minimizing over all possible subjet directions, using a new variant of the k-means clustering algorithm. On boosted top benchmark samples from the BOOST2010 workshop, we demonstrate that a simple cut on the 3-subjettiness to 2-subjettiness ratio yields 20% (50%) tagging efficiency for a 0.23% (4.1%) fake rate, making N-subjettiness a highly effective boosted top tagger. N-subjettiness can be modified by adjusting an angular weighting exponent, and we find that the jet broadening measure is preferred for boosted top searches. We also explore multivariate techniques, and show that additional improvements are possible using a modified Fisher discriminant. Finally, we briefly mention how our minimization procedure can be extended to the entire event, allowing the event shape N-jettiness to act as a fixed N cone jet algorithm.
1 Introduction

With over one inverse femtobarn delivered by the Large Hadron Collider (LHC), the ATLAS and CMS experiments are truly exploring the high energy frontier of particle physics. Jets are an important probe for physics beyond the standard model, and both experiments have demonstrated a high level of sophistication in their study of jets. Using modern infrared- and collinear-safe jet algorithms [1, 2], the LHC experiments are searching for new physics in monojet production [3, 4], high-mass dijet resonances [5, 6], as well as multijet final states [7, 8], and these searches have an impressive reach for new phenomena.

In addition, both experiments have started to use boosted hadronic objects as a probe of new physics in data [9-11] (see also Refs. [12, 13] for Tevatron measurements). When hadronically decaying resonances, such as top quarks, Higgs bosons, or W/Z bosons, are produced with a large enough Lorentz boost factor, they form a "fat jet" in which the decay products are highly collimated. Jet mass is the most basic observable for distinguishing a boosted object from an ordinary quark- or gluon-initiated jet, but there has also been an explosion of interest in using jet substructure techniques to further distinguish, say, "top jets" from "QCD jets". The experimental and theoretical progress in jet substructure has been summarized in a report following the BOOST2010 workshop [14], where the various tagging methods were roughly grouped as follows: algorithmic procedures to directly identify subjets within a fat jet [15-21]; jet shape techniques to measure the energy flow in a jet [22-25]; and grooming methods to improve jet mass resolution by reducing jet contamination [26-30]. There has also been work on template and matrix element methods [31, 32].

Figure 1. Comparison of N-subjettiness to other boosted top taggers using benchmark samples from the BOOST2010 report [14]. These efficiency/mistag curves are taken from Ref. [14] and then overlaid with our results from Fig. 9 (for a one-dimensional $\tau_3/\tau_2$ cut) and Fig. 12 (for a multivariate $\tau_N$ method). Details about these curves are given in Sec. 4, and we will use a different range for the vertical axis in subsequent figures to highlight the small mistag rate region. Except for the very high efficiency region, N-subjettiness outperforms previous top tagging methods.

Recently, we introduced a new method to tag boosted hadronic objects using a jet shape called N-subjettiness [33]. Denoted by $\tau_N$ and adapted from the event shape N-jettiness [34], N-subjettiness measures the degree to which radiation within a jet is aligned along N candidate subjet axes. As a jet shape, N-subjettiness is interesting in its own right, since it is a calculable property of jets that generalizes the notion of jet angularities [22, 35, 36]. As a boosted object tagger, N-subjettiness exhibits a number of advantages, combining the flexibility of jet shape techniques with the tagging performance of algorithmic procedures. As a proof of concept, we found in Ref. [33] that a simple one-dimensional cut on the ratio $\tau_3/\tau_2$ is particularly effective for identifying boosted hadronic tops. An alternative version of N-subjettiness defined in the jet rest frame was introduced by Kim in Ref. [37] and applied to boosted Higgs searches. N-subjettiness has also been applied to boosted ditau resonances [38] and technipions [39].

In this paper, we will show how the tagging performance of N-subjettiness can be improved through minimization, focusing on the case of boosted tops. As originally defined in Ref. [33], N-subjettiness required an external algorithm to determine the N candidate subjet axes within a jet, as it relied on axes from the exclusive $k_T$ clustering algorithm [40, 41] to calculate $\tau_N$. Here, we will show how to find the subjet axes which minimize $\tau_N$, using a variant of the so-called k-means clustering algorithm [42]. This is analogous to how the event shape thrust [43] is defined: thrust can be measured with respect to any axis, but what we call "thrust" is determined by the axis that maximizes thrust (equivalently, minimizes $1 - T$).

Using the minimum $\tau_N$, we will demonstrate the excellent tagging performance of N-subjettiness on the boosted top benchmark samples prepared for the BOOST2010 report [14]. Analogous to jet angularities, N-subjettiness can incorporate different angular weighting exponents, and we will find that the best tagging performance is achieved for the "jet broadening" measure [44]. Different minimization procedures are needed for different angular weighting exponents: we will see that k-means clustering minimizes the thrust measure, while a new algorithm is introduced for more general angular measures such as the jet broadening measure.¹ The tagging performance of N-subjettiness is summarized in Fig. 1, which demonstrates the excellent performance of both a one-dimensional cut on $\tau_3/\tau_2$ and a modified Fisher discriminant based on N-subjettiness and jet mass information. While we focus on boosted 3-prong tops in this paper, we expect the same minimization technique to improve $\tau_2/\tau_1$ for boosted 2-prong identification as well (i.e. W/Z or Higgs bosons).

Finally, turning to the event as a whole, we will show that the same $\tau_N$ minimization procedure can be applied to the event shape N-jettiness [34], allowing N-jettiness to act like a fixed N cone jet algorithm. We will briefly comment on how such a procedure might be useful for boosted object searches.

The remainder of this paper is organized as follows. In Sec. 2, we review the definition of N-subjettiness and describe two generalizations. We then introduce the procedure to minimize N-subjettiness in Sec. 3.² We study the top tagging performance of N-subjettiness in Sec. 4, using the BOOST2010 benchmark samples. We briefly describe how our minimization procedure can be extended to convert N-jettiness into a jet algorithm in Sec. 5, and conclude in Sec. 6.

2 Generalizing N-subjettiness 

Boosted hadronic tops have a radiation pattern that is distinctly different from gluon- or quark-initiated jets, owing to the 3-prong nature of the top decay. N-subjettiness exploits this difference in expected energy flow by "counting" the number of hard lobes of energy within a jet. Here, we will generalize the original definition of N-subjettiness from Ref. [33] in two ways, first by including an angular weighting exponent and second by minimizing N-subjettiness over all possible candidate subjet axes.

Consider a fat jet reconstructed using some jet algorithm. N-subjettiness is defined with respect to N candidate subjet axes, that is, N light-like directions $\hat n_J$ within a jet that are chosen to align with the dominant radiation directions. We will use a tilde to indicate N-subjettiness measured with respect to generic subjet axes:

$$\tilde\tau_N^{(\beta)} = \frac{1}{d_0}\sum_i p_{T,i}\,\min\left\{(\Delta R_{1,i})^\beta, (\Delta R_{2,i})^\beta, \ldots, (\Delta R_{N,i})^\beta\right\}. \tag{2.1}$$

Here, i runs over the constituent particles in a given jet, $p_{T,i}$ are their transverse momenta, and $\Delta R_{J,i} = \sqrt{(\Delta y_{J,i})^2 + (\Delta\phi_{J,i})^2}$ is the distance in the rapidity-azimuth plane between a candidate subjet J and a constituent particle i. Compared to Ref. [33], we have included an angular weighting exponent $\beta$, and we will often drop the superscript for notational simplicity. The normalization factor $d_0$ is taken as

$$d_0 = \sum_i p_{T,i}\, R_0^\beta, \tag{2.2}$$

where $R_0$ is the characteristic jet radius used in the original jet clustering algorithm.

¹ After we developed our algorithm, we learned of Ref. [45] on R1-k-means clustering, which implements a similar procedure for the jet broadening measure alone.

² The minimization algorithm is available at http://www.jthaler.net/jets/ as a plugin to FastJet [46, 47].

The choice of subjet axes is crucial for defining N-subjettiness, since Eq. (2.1) partitions the jet constituents into N so-called Voronoi regions centered on the subjet axes. In Ref. [33], the exclusive $k_T$ algorithm [40, 41] was used to find the directions $\hat n_J$. Here, we will focus on the axes which minimize $\tilde\tau_N$, removing the tilde:

$$\tau_N^{(\beta)} = \min_{\hat n_1, \hat n_2, \ldots, \hat n_N} \tilde\tau_N^{(\beta)}. \tag{2.3}$$

In particular, $\tilde\tau_N$ is a function of the N light-like subjet axes $\hat n_J$, and $\tau_N$ is the value of this function at its (global) minimum. This minimization over candidate subjet directions is not a trivial step and may at first seem computationally daunting, but in Sec. 3.1 we present an efficient algorithm to perform this task. Once the minimum is found, the normalization factor in Eq. (2.2) ensures that $0 \le \tau_N \le 1$.
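As a minimal sketch of Eqs. (2.1) and (2.2) for fixed axes, assuming particles are given as (p_T, y, φ) tuples and axes as (y, φ) pairs (a representation chosen for illustration, not the interface of the FastJet plugin), $\tilde\tau_N^{(\beta)}$ can be evaluated as follows. The helper names `delta_phi`, `delta_r`, and `tau_tilde` are hypothetical; azimuthal differences are wrapped into [−π, π).

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(y1, phi1, y2, phi2):
    """Distance in the rapidity-azimuth plane."""
    return math.hypot(y1 - y2, delta_phi(phi1, phi2))

def tau_tilde(particles, axes, beta=1.0, r0=1.0):
    """N-subjettiness of Eq. (2.1) for fixed (not necessarily minimal) axes.

    particles: list of (pT, y, phi) tuples for the jet constituents
    axes:      list of (y, phi) tuples, one per candidate subjet axis
    beta:      angular weighting exponent; r0: jet radius R_0 of Eq. (2.2)
    """
    d0 = sum(pt for pt, _, _ in particles) * r0 ** beta  # Eq. (2.2)
    total = 0.0
    for pt, y, phi in particles:
        # each particle contributes only through its nearest axis
        nearest = min(delta_r(y, phi, ya, phia) for ya, phia in axes)
        total += pt * nearest ** beta
    return total / d0
```

For a ratio such as $\tau_3/\tau_2$ one would call `tau_tilde(jet, axes3) / tau_tilde(jet, axes2)`; the normalization $d_0$ cancels in the ratio, so it only matters for the absolute scale.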

The angular weighting exponent $\beta$ is analogous to the parameter $a$ in angularities [35], with the correspondence $a = 2 - \beta$. Collinear safety requires $\beta > 0$. In Ref. [33], we found that $\beta = 1$ (corresponding to the jet broadening measure [44]) was particularly effective for boosted object identification, and this finding will be confirmed in Sec. 4. Interestingly, the choice $\beta = 1$ is also preferred for discriminating light-quark jets from gluon jets [48]. As we will see in Sec. 3.1, $\beta = 2$ (corresponding to the thrust measure [43]) is a special value from a minimization point of view. In addition, when we discuss jet finding in Sec. 5, $\beta = 2$ will correspond most closely to iterative cone algorithms.

In Fig. 2, we demonstrate how N-subjettiness works on a boosted top jet compared to a QCD jet with mass near $m_\text{top}$. Shown are the subjet axes and Voronoi regions determined by the minimum $\tau_N$ with $\beta = 1$ and $\beta = 2$, as well as $\tilde\tau_N$ using subjets from the exclusive $k_T$ algorithm. Note that the partitioning depends crucially on the choice of subjet axes. Also, unlike recursive clustering procedures like the $k_T$ [40, 41] or Cambridge-Aachen [49, 50] methods, the regions determined by minimizing $\tau_N$ are not directly correlated with the regions determined by minimizing $\tau_{N-1}$.³ The axes that minimize $\tau_N$ with $\beta = 1$ tend to point in the direction of actual jet radiation (like a "median"), while the axes that minimize $\tau_N$ with $\beta = 2$ tend to point in a direction determined by the average subjet energy (like a "mean").

[Figure 2 panels. Top row, boosted top jet: (a) $\tau_3/\tau_2 = 0.32$ (linear, $\beta = 1$); (b) $\tau_3/\tau_2 = 0.22$ (quadratic, $\beta = 2$); (c) $\tilde\tau_3/\tilde\tau_2 = 0.28$ ($k_T$ axes, $\beta = 1$). Bottom row, fat QCD jet: $\tau_3/\tau_2 = 0.63$ in the quadratic ($\beta = 2$) panel.]

Figure 2. Top row: Event displays for a typical top jet with invariant mass near $m_\text{top}$. In (a), the orange square, circles, and crosses indicate the axes that minimize $\tilde\tau_1$, $\tilde\tau_2$, and $\tilde\tau_3$, respectively, for $\beta = 1$ ("linear" minimization). The dashed orange line indicates the edge of the two Voronoi regions for the axes minimizing $\tilde\tau_2^{(1)}$, and the solid orange lines indicate the Voronoi edges for the axes minimizing $\tilde\tau_3^{(1)}$. In (b), we show the same top jet with equivalent information for $\beta = 2$ and the "quadratic minimization" in black, and in (c) for $\beta = 1$ and the axes found by the exclusive $k_T$ algorithm in gray. In this and subsequent event displays, the particles are clustered into virtual calorimeter cells of size 0.1 by 0.1, and the marker area for each cell is proportional to its scalar transverse momentum. Bottom row: similar diagrams for a fat QCD jet with mass near $m_\text{top}$.

It is straightforward to see why $\tau_N$ quantifies how N-subjetty a particular jet is, or in other words, to what degree it can be regarded as a jet composed of N subjets. Jets with $\tau_N \approx 0$ have all their radiation aligned with the subjet directions $\hat n_J$ and therefore have N (or fewer) subjets. Jets with $\tau_N \gg 0$ have a large fraction of their energy distributed away from the subjet directions $\hat n_J$ and therefore have at least N + 1 subjets. Therefore, jets that are very "N-subjetty" should have a relatively large difference between their $\tau_N$ and $\tau_{N-1}$ values. In practice, the purely geometrical, dimensionless ratio $\tau_N/\tau_{N-1}$ is the best (simple) discriminant for N-prong hadronic decays, a point we will further elaborate on in Sec. 4.2.

³ In what follows, we will only compare the minimization axes to the axes found by the exclusive $k_T$ algorithm. Even though the Cambridge-Aachen algorithm also has an exclusive version which returns a fixed number of subjets, the nature of its clustering procedure allows far-away soft radiation to be clustered into the jet last, yielding anomalously large values for $\tilde\tau_N$.

3 Minimization Procedure 

A key ingredient in the definition of N-subjettiness is an appropriate choice of candidate subjet directions $\hat n_J$. Ideally, one would determine $\tau_N$ by minimizing over all possible candidate subjet directions, analogously to how the event shape thrust is defined [43]. In that case, $\tau_N$ is a strictly decreasing function of N with $0 \le \tau_N/\tau_{N-1} < 1$, since adding an additional subjet axis can always decrease the Voronoi distances.

In Ref. [33], it was (erroneously) believed that a search for the global minimum of $\tilde\tau_N$ would be too computationally intensive, which is why candidate subjet directions were determined using the exclusive $k_T$ algorithm. While this was found to work reasonably well for boosted object tagging, it introduced residual algorithmic dependence and a certain sense of arbitrariness in the jet shape. Here, we present a fast minimization procedure to determine the candidate subjet directions which minimize $\tilde\tau_N$, using a generalization of k-means clustering.

3.1 Minimization Algorithm 

Minimizing the function $\tilde\tau_N^{(\beta)}$ in Eq. (2.1) is similar to the classic computer science problem of finding k clusters in a data set.⁴ For $\beta = 2$, this is the k-means clustering problem, which is to find the k cluster centers (or "means") that minimize the in-cluster variance (i.e. the weighted sum of the squared distances between data points and their nearest cluster center). One solution to this problem is Lloyd's algorithm [42], which terminates in polynomial time and produces k means which form a (local) minimum of the cluster variance. Combined with "sufficiently" many reseedings of the initial k cluster centers, Lloyd's algorithm can find the global minimum of the cluster variance. Below we generalize Lloyd's algorithm beyond $\beta = 2$, to an algorithm capable of minimizing $\tilde\tau_N$ for $1 \le \beta < 3$.

⁴ The "k" is standard notation in the computer science literature, while "N" is standard notation for jet counting in particle physics. We will use "k" when referring to the k-means algorithm, and "N" for the jet shape N-subjettiness, but of course k = N throughout.

Let us motivate an adaptation of Lloyd's algorithm which aims to minimize N-subjettiness also for $\beta \neq 2$. For simplicity, suppose for a moment that we want to minimize 1-subjettiness for a particular cluster C by adjusting a single subjet axis $(y_0, \phi_0)$:

$$\tilde\tau_1^{(\beta)} = \frac{1}{d_0}\sum_{i\in C} p_{T,i}\left[(y_i - y_0)^2 + (\phi_i - \phi_0)^2\right]^{\beta/2}. \tag{3.1}$$
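Taking first-order partial derivatives of $\tilde\tau_1$ and setting them to zero gives:

$$\begin{aligned}
\frac{\partial\tilde\tau_1^{(\beta)}}{\partial y_0} &= -\frac{\beta}{d_0}\sum_{i\in C} p_{T,i}\,(y_i - y_0)\left[(y_i - y_0)^2 + (\phi_i - \phi_0)^2\right]^{\frac{\beta-2}{2}} = 0, \\
\frac{\partial\tilde\tau_1^{(\beta)}}{\partial \phi_0} &= -\frac{\beta}{d_0}\sum_{i\in C} p_{T,i}\,(\phi_i - \phi_0)\left[(y_i - y_0)^2 + (\phi_i - \phi_0)^2\right]^{\frac{\beta-2}{2}} = 0.
\end{aligned} \tag{3.2}$$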



Any pair $(y_0, \phi_0)$ which solves these two equations for a given distribution of particles corresponds to a local minimum of $\tilde\tau_1$. For $\beta = 2$, the equations are easily solved by finding the weighted centroid of the cluster,

$$(y_0, \phi_0) = \left(\frac{\sum_i p_{T,i}\, y_i}{\sum_j p_{T,j}},\ \frac{\sum_i p_{T,i}\, \phi_i}{\sum_j p_{T,j}}\right), \tag{3.3}$$

and this observation forms the basis of Lloyd's algorithm. Interestingly, for recursive clustering algorithms, the (sub)jet axis is also aligned with the weighted centroid,⁵ so there is a relatively small difference between jet axes found with $k_T$-like algorithms and jet axes found by minimizing 1-(sub)jettiness with $\beta = 2$. (See also Sec. 5.1 for a discussion of iterative cone algorithms.)

For general $\beta$, Eq. (3.2) does not have a closed-form solution. However, there is a fast iterative algorithm to find a local minimum $(y_0, \phi_0)$ to arbitrary precision. Suppose we already have a "guess" or initial seeding of the candidate subjet direction; call it $(y_0^{(0)}, \phi_0^{(0)})$. We can then define a recursive procedure $(y_0^{(n)}, \phi_0^{(n)}) \to (y_0^{(n+1)}, \phi_0^{(n+1)})$ as

$$1 \le \beta < 3: \qquad y_0^{(n+1)} = \frac{\sum_{i\in C} p_{T,i}\, y_i \left[(y_i - y_0^{(n)})^2 + (\phi_i - \phi_0^{(n)})^2\right]^{\frac{\beta-2}{2}}}{\sum_{j\in C} p_{T,j} \left[(y_j - y_0^{(n)})^2 + (\phi_j - \phi_0^{(n)})^2\right]^{\frac{\beta-2}{2}}}, \tag{3.4}$$

and similarly for $\phi_0^{(n+1)}$. It is straightforward to see that if $(y_0^{(n+1)}, \phi_0^{(n+1)}) = (y_0^{(n)}, \phi_0^{(n)})$, then we have found a local minimum. Furthermore, we argue in Sec. 3.3 that any cluster of particles has only one local minimum of $\tilde\tau_1$ for $\beta \ge 1$, which is thus the global minimum.⁶
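The step from Eq. (3.2) to Eq. (3.4) can be made explicit with a short worked derivation. The reading below, in which the distance-dependent factor is frozen at the current iterate, is our interpretation of Eq. (3.4); for $\beta = 1$ it has the same form as the Weiszfeld iteration for a weighted geometric median.

```latex
\begin{aligned}
w_i^{(n)} &\equiv p_{T,i}\left[(y_i - y_0^{(n)})^2 + (\phi_i - \phi_0^{(n)})^2\right]^{\frac{\beta-2}{2}}, \\
\frac{\partial\tilde\tau_1^{(\beta)}}{\partial y_0} = 0
  \;&\Longrightarrow\; \sum_{i\in C} w_i\,(y_i - y_0) = 0
  \quad (\text{weights frozen at } w_i^{(n)}) \\
  &\Longrightarrow\; y_0^{(n+1)} = \frac{\sum_{i\in C} w_i^{(n)}\, y_i}{\sum_{j\in C} w_j^{(n)}}, \\
\beta = 2: \quad w_i^{(n)} = p_{T,i}
  \;&\Longrightarrow\; y_0^{(1)} = \frac{\sum_{i\in C} p_{T,i}\, y_i}{\sum_{j\in C} p_{T,j}}
  \quad \text{(Eq. (3.3), reached in one step)}.
\end{aligned}
```

If the iteration converges, its fixed point satisfies Eq. (3.2) exactly, because the frozen weights then coincide with the weights evaluated at the solution.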

The sequence $(y_0^{(n)}, \phi_0^{(n)})$ does not generally yield an exact solution to Eq. (3.2) in finite time, but for $1 \le \beta < 3$ it will quickly asymptote to the desired value. This is demonstrated for a two-particle configuration in Fig. 3, where convergence is shown for $1 \le \beta < 3$. For $\beta \ge 3$, the recursive procedure in Eq. (3.4) tends to diverge, though this behavior can be remedied if a dampening factor is included in the recursion.⁷ For $\beta < 1$, $\tilde\tau_1$ has many local minima, and finding the global minimum becomes computationally impractical. In the range $1 \le \beta < 3$, we can efficiently find the global minimum of $\tilde\tau_1$ to arbitrary precision.

⁵ Strictly speaking, this is only true for recursive clustering algorithms that use the $p_T$ scheme [51] for defining the jet axis. Modern recursive clustering algorithms use the E scheme, which maintains information about the mass of a jet axis, so the rapidity distance is modified compared to using a light-like axis in the $p_T$ scheme.

⁶ For $\beta = 1$ and certain fine-tuned particle configurations, it is possible to have a degenerate line of local minima. However, as $\tilde\tau_1^{(1)}$ is constant on this line, convergence of the recursive algorithm means that $\tau_1^{(1)}$ itself (the value of the global minimum) is still found.

[Figure 3 plot: two particles with nearly equal $p_T$ ($\delta = 0.01$); horizontal axis: step number n + 1.]

Figure 3. Convergence of the minimization algorithm for notable values of $\beta$ on a one-dimensional two-particle configuration. One particle is located at y = 0 with $p_T = e^{-\delta}$, the other at y = 1 with $p_T = e^{+\delta}$, and the global minimum of $\tilde\tau_1^{(\beta)}$ is located at $y_0 = \frac{1}{2}\left(1 + \tanh\frac{\delta}{\beta - 1}\right)$. The algorithm is initialized at $y_0^{(0)} = 0.1$, which is closer to the softer particle. Convergence to the global minimum of $\tilde\tau_1^{(\beta)}$ is reached for $1 \le \beta < 3$. The algorithm can converge to a non-global minimum for $\beta < 1$ if the initial axis is chosen too close to the softer particle (here shown by $\beta = 0.9$), and the algorithm diverges for $\beta \ge 3$ (here shown for the critical case $\beta = 3$). For $\beta = 2$, the algorithm finds the global minimum in one step, as expected from Lloyd's algorithm.

[Figure 4 panels (a), (b), (c): boosted top jet, linear minimization.]

Figure 4. Convergence path of the minimization algorithm for N = 3 and $\beta = 1$. Shown is the same top jet as in Fig. 2. Panels (a), (b) and (c) show three different initial seedings for our modified k-means clustering procedure. The open circle is the seed position, the dots are the updated positions, and a line connecting them is drawn to guide the eye. The first two seeds find the correct global minimum in a small number of steps, while the third seed gets trapped at a local minimum.
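A minimal Python sketch of the single-axis recursion in Eq. (3.4), tested on the two-particle configuration of Fig. 3. Assumptions of the sketch: particles are (p_T, y, φ) tuples, azimuthal differences are wrapped into [−π, π), and a tiny regulator `eps` (not part of Eq. (3.4)) keeps the weight finite if the axis lands exactly on a particle when β < 2. The function name `minimize_one_axis` is hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def minimize_one_axis(particles, axis, beta=1.0, max_iter=1000, tol=1e-10, eps=1e-12):
    """Iterate Eq. (3.4) for a single cluster until the axis stops moving.

    particles: list of (pT, y, phi); axis: seed (y0, phi0).
    eps is a small regulator (an assumption of this sketch) that keeps the
    weight finite if the axis sits exactly on a particle when beta < 2.
    """
    y0, phi0 = axis
    for _ in range(max_iter):
        sum_w = sum_wdy = sum_wdphi = 0.0
        for pt, y, phi in particles:
            dy, dphi = y - y0, delta_phi(phi, phi0)
            # weight p_T * [dy^2 + dphi^2]^((beta - 2) / 2) from Eq. (3.4)
            w = pt * (dy * dy + dphi * dphi + eps) ** ((beta - 2.0) / 2.0)
            sum_w += w
            sum_wdy += w * dy
            sum_wdphi += w * dphi
        # weighted centroid, written as a shift relative to the current axis
        step_y, step_phi = sum_wdy / sum_w, sum_wdphi / sum_w
        y0, phi0 = y0 + step_y, phi0 + step_phi
        if math.hypot(step_y, step_phi) < tol:
            break
    return y0, delta_phi(phi0, 0.0)  # report phi0 wrapped into [-pi, pi)
```

Writing the update as a shift relative to the current axis is algebraically identical to Eq. (3.4) and lets the azimuthal wrap-around be handled in one place. With `beta=3` the same call should not be expected to converge, matching the critical case shown in Fig. 3.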

The generalization from 1-subjettiness to N-subjettiness is then straightforward. From Eq. (2.1), we see that $\tilde\tau_N$ partitions the jet constituents into Voronoi subclusters and sums the $\tilde\tau_1$ values for each of the subclusters. Since the procedure in Eq. (3.4) asymptotes to the (global) minimum of $\tilde\tau_1$ for any given subcluster, we can iteratively determine a (local) minimum of $\tilde\tau_N$ by repeatedly applying Eq. (3.4) for each subcluster and then recalculating the Voronoi regions. As is the case for Lloyd's algorithm, this procedure is not guaranteed to find the global minimum of $\tilde\tau_N$ (except for N = 1 and $1 \le \beta < 3$), though as we will discuss in Sec. 3.2, we have found that with sufficiently many starting seeds, the global minimum is obtained within the desired precision.

Our algorithm for minimizing N-subjettiness works essentially the same as Lloyd's algorithm, with a modified update step and stopping criterion:

• Initialization Step: Pick initial seed axes $\hat n_J^{(0)} = (y_J^{(0)}, \phi_J^{(0)})$ for $J \in \{1, \ldots, N\}$, and set the iteration number n = 0. We will discuss the choice of seed axes in more detail in Sec. 3.2.

• Assignment Step: Divide the jet into clusters $C_J$ by assigning the particles to the closest subjet direction. In other words, $i \in C_J$ if and only if $\Delta R(p_i, \hat n_J^{(n)}) < \Delta R(p_i, \hat n_M^{(n)})$ for all $M \neq J$.

• Update Step: Update each cluster axis according to Eq. (3.4), yielding a new set of $\hat n_J^{(n+1)}$ axes.

• Iteration: Repeat the Assignment and Update Steps (incrementing n) until the average directional change of the subjets,

$$\Delta = \frac{1}{N}\sum_{J=1}^{N} \Delta R\left(\hat n_J^{(n)}, \hat n_J^{(n+1)}\right), \tag{3.5}$$

is smaller than the desired precision threshold ($\Delta < 10^{-4}$ in this paper).
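The four steps above can be sketched as one self-contained Python function. This is an illustration under stated assumptions, not the FastJet plugin: particles are (p_T, y, φ) tuples, $R_0 = 1$ by default, an empty cluster simply keeps its old axis, and `eps` regularizes the weights at zero distance. The name `minimize_axes` is hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(y1, phi1, y2, phi2):
    return math.hypot(y1 - y2, delta_phi(phi1, phi2))

def minimize_axes(particles, seeds, beta=1.0, r0=1.0, precision=1e-4,
                  max_iter=1000, eps=1e-12):
    """Modified k-means sketch: returns (tau_N, axes) reached from `seeds`."""
    axes = [tuple(a) for a in seeds]                       # Initialization Step
    for _ in range(max_iter):
        # Assignment Step: each particle joins the cluster of its nearest axis
        clusters = [[] for _ in axes]
        for pt, y, phi in particles:
            j = min(range(len(axes)), key=lambda k: delta_r(y, phi, *axes[k]))
            clusters[j].append((pt, y, phi))
        # Update Step: one application of Eq. (3.4) per cluster
        new_axes = []
        for (y0, phi0), members in zip(axes, clusters):
            if not members:  # empty cluster: keep the old axis (assumption)
                new_axes.append((y0, phi0))
                continue
            sum_w = sum_wdy = sum_wdphi = 0.0
            for pt, y, phi in members:
                dy, dphi = y - y0, delta_phi(phi, phi0)
                w = pt * (dy * dy + dphi * dphi + eps) ** ((beta - 2.0) / 2.0)
                sum_w += w
                sum_wdy += w * dy
                sum_wdphi += w * dphi
            new_axes.append((y0 + sum_wdy / sum_w, phi0 + sum_wdphi / sum_w))
        # Iteration: stop once the average axis displacement, Eq. (3.5), is small
        change = sum(delta_r(*a, *b) for a, b in zip(axes, new_axes)) / len(axes)
        axes = new_axes
        if change < precision:
            break
    # Evaluate Eq. (2.1) with the normalization of Eq. (2.2) at the final axes
    d0 = sum(pt for pt, _, _ in particles) * r0 ** beta
    tau = sum(pt * min(delta_r(y, phi, *a) for a in axes) ** beta
              for pt, y, phi in particles) / d0
    return tau, axes
```

For $\beta = 2$ the update reduces to the weighted centroid of Eq. (3.3), so the function behaves like weighted Lloyd's algorithm; a single call only reaches a local minimum, which is why the seeding strategy of Sec. 3.2 matters.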

In the computer science literature, a similar algorithm called R1-k-means was proposed in Ref. [45] for $\beta = 1$. The R1-k-means algorithm should not be confused with the k-medians algorithm, as k-medians is not rotationally symmetric.

⁷ That is, instead of using $y_0^{(n+1)}$ directly, one uses a modified $y_0'^{(n+1)} = d\, y_0^{(n)} + (1 - d)\, y_0^{(n+1)}$, with a dampening factor $0 \le d < 1$ (the undamped case is d = 0). With $1/2 < d < 1$, the minimization algorithm does converge for all $\beta \ge 1$.






3.2 Infrared Safety and Seed Choices 

Given candidate subjet axes (not necessarily the minimum axes), $\tilde\tau_N$ is an infrared- and collinear-safe observable. Since Eq. (2.1) is linear in each of the constituent particles' transverse momenta, the addition of infinitesimally soft particles does not change N-subjettiness (infrared safety). This linear $p_T$ dependence combined with smooth angular dependence ($\beta > 0$) ensures that the same $\tilde\tau_N$ value is obtained for collinear splittings (collinear safety).
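The safety argument above can be checked numerically for fixed axes. The sketch below uses a hypothetical helper and a toy jet (azimuthal wrapping is omitted because all particles sit well inside one azimuthal period) and compares $\tilde\tau_N$ for the jet with the same jet plus a very soft particle, and with one particle split into two collinear pieces. It illustrates only the fixed-axis statement; the safety of the axes themselves is the subject of the next paragraph.

```python
import math

def tau_tilde(particles, axes, beta=1.0, r0=1.0):
    """Eq. (2.1) for fixed axes; phi wrapping omitted (toy jet spans a small phi range)."""
    d0 = sum(pt for pt, _, _ in particles) * r0 ** beta
    num = sum(pt * min(math.hypot(y - ya, phi - pa) for ya, pa in axes) ** beta
              for pt, y, phi in particles)
    return num / d0

jet = [(100.0, 0.0, 0.0), (50.0, 0.4, 0.1), (20.0, -0.3, 0.5)]   # (pT, y, phi)
axes = [(0.0, 0.0), (0.4, 0.1)]                                   # fixed axes (y, phi)

base = tau_tilde(jet, axes)
# Infrared: add a very soft particle away from both axes
soft = tau_tilde(jet + [(1e-9, 0.7, -0.6)], axes)
# Collinear: split the third particle into two collinear pieces (60% / 40% of its pT)
split = tau_tilde(jet[:2] + [(12.0, -0.3, 0.5), (8.0, -0.3, 0.5)], axes)
print(base, soft, split)
```

The soft particle shifts the value only at the level of its own tiny $p_T$, and the collinear split leaves it unchanged up to floating-point rounding.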

Crucially, the candidate subjet axes used in N-subjettiness must be determined via a method that is also infrared- and collinear-safe. Certainly, the subjet axes which determine the global minimum of $\tilde\tau_N$ are infrared safe. But even if the algorithm in Sec. 3.1 can only find a local minimum, the minimization procedure is still infrared safe as long as the method to determine the seed axes is infrared safe.⁸

Of course, this still leaves an ambiguity as to the exact method for choosing the initial seed subjet axes. Note that randomly choosing initial subjet axes is a non-deterministic procedure and therefore gives an ambiguous definition of N-subjettiness, since there is a chance the algorithm will converge to a non-global minimum. A deterministic (and infrared- and collinear-safe) option would be to use the output of a recursive subjet clustering algorithm (such as exclusive $k_T$) as the seed axes, and only do one pass at minimization, though there is still no guarantee of converging to the global minimum. However, given the speed of the minimization algorithm in Sec. 3.1 and the fact that, with the definition of Eq. (2.1), a jet typically has relatively few local minima, one can almost always find the global minimum of $\tau_N$ by brute-force reinitialization with random seed axes.

Throughout the paper, we use random initialization (repeated 100 times) and keep the axes that yield the smallest $\tilde\tau_N$. More precisely, we first recluster the jet with the exclusive $k_T$ algorithm into exactly N candidate subjets and add random noise, uniformly distributed in a 0.8 × 0.8 square, to the rapidity-azimuth coordinates of these axes. We then use these shifted coordinates as 100 different sets of seed axes, and the outcome of the minimization algorithm which yields the lowest $\tilde\tau_N$ is identified as the global minimum. In Fig. 4, we show three typical minimization paths for $\beta = 1$ and different initial seeds. Even when the seed axes are quite far from the (local) minimum, only a small number of iterations are typically needed to achieve $\Delta < 10^{-4}$, and the majority of initializations converge to the global minimum.⁹
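The seeding strategy just described can be sketched as a wrapper around any minimizer. Assumptions of this sketch: `base_axes` would come from exclusive $k_T$ reclustering (not implemented here), the 0.8 × 0.8 noise square is taken to be centered on each axis, and `minimize` is any callable returning `(tau, axes)`, for example a modified k-means routine. The name `best_of_seeds` is hypothetical.

```python
import random

def best_of_seeds(base_axes, minimize, n_seeds=100, width=0.8, rng=None):
    """Run `minimize` from randomly shifted copies of base_axes; keep the lowest tau.

    base_axes: [(y, phi), ...], e.g. exclusive-kT axes computed elsewhere
    minimize:  callable(seed_axes) -> (tau, axes)
    width:     side length of the (assumed centered) uniform noise square
    rng:       random.Random instance; pass a seeded one for reproducible runs
    """
    rng = rng if rng is not None else random.Random()
    half = width / 2.0
    best = None
    for _ in range(n_seeds):
        # shift every axis independently in rapidity and azimuth
        seeds = [(y + rng.uniform(-half, half), phi + rng.uniform(-half, half))
                 for y, phi in base_axes]
        result = minimize(seeds)
        if best is None or result[0] < best[0]:
            best = result
    return best
```

Passing a seeded `random.Random` makes a given run reproducible, although the resulting definition of $\tau_N$ still depends on the seeding choice, as discussed above.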

In Fig. 5, we compare the minimum value (after 100 seeds) of N-subjettiness to the value obtained using the exclusive $k_T$ axes, using the $t\bar t$ sample described in Sec. 4.1. We also show the (local) minimum value obtained by doing a single pass at minimization starting from the exclusive $k_T$ axes (without added noise). We see that for $\beta = 2$, the minimum axes and the exclusive $k_T$ axes are quite similar, as expected from the discussion below Eq. (3.3). For $\beta = 1$, there can be a 50% shift between the minimum $\tau_N$ and the exclusive $k_T$ $\tilde\tau_N$, though this difference is quickly diminished by one pass of the minimization procedure.

⁸ In particular, the addition of infinitesimally soft particles does not affect the minimization procedure, as they cannot create extra local minima. Also, since the number of cluster regions is fixed at N, soft radiation certainly cannot change the number of subjets.

⁹ We note that our minimization algorithm becomes less sensitive to the initial seeding for higher values of $\beta$. In that respect, $\beta = 1$, which we use extensively throughout this paper, is not optimal though still manageable for our purposes, as only 100 seeds are sufficient to obtain $\tau_N$ values within about 1% of the values found after thousands of seeds. In other applications, one may find it useful to use $\beta \approx 1.1$, which in many respects (including tagging performance) has similar behavior to $\beta = 1$ but decreased seed sensitivity and faster convergence (see Fig. 3). That said, the positions of the minimum axes can be very different for $\beta \approx 1.1$ compared to $\beta = 1$.

Figure 5. Difference between the minimum value of $\tau_N$ and the exclusive $k_T$ $\tilde\tau_N$. The event sample is the 500-600 GeV $t\bar t$ sample detailed in Sec. 4.1, with the same event selection as Fig. 7. The top row is $\beta = 1$, the bottom row is $\beta = 2$, and the columns are $\tau_1^{(\beta)}$, $\tau_2^{(\beta)}$, and $\tau_3^{(\beta)}$. For $\beta = 1$, the difference between the minimum $\tau_N$ and the exclusive $k_T$ $\tilde\tau_N$ can be of order 50%, though this difference is ameliorated by doing a single pass of the minimization procedure using the exclusive $k_T$ axes as a seed. For $\beta = 2$, the values of N-subjettiness typically differ by less than 10%, except for rare cases where the exclusive $k_T$ axes are near a local minimum of $\tilde\tau_N$, such that even doing a single pass of the minimization procedure does not help much.

3.3 Uniqueness of 1-subjettiness Minima 

The fact that a relatively small number of seeds is needed to find the global minimum of N-subjettiness is due in part to an interesting property of 1-subjettiness, which is that $\tilde\tau_1$ has a unique minimum for $\beta \ge 1$ (i.e. the global minimum). One can think of finding $\tau_N$ for $N = k \ge 2$ and $\beta \ge 1$ as being separated into two tasks: first partitioning the jet constituents into N subclusters $C_J$ which together yield the lowest sum of subcluster $\tau_1^{(\beta)}$ values, and then finding the unique minimum of each $\tilde\tau_1^{(\beta)}$. Of course, the algorithm in Sec. 3.1 tackles partitioning and minimization at the same time.
To see why 1-subjettiness has a unique minimum, note that Eq. (3.1) is a sum of continuous "potential" functions of $(y_0, \phi_0)$, one for each particle. For $\beta > 1$, these potential functions are strictly convex since they behave like $\Delta R^\beta$. The sum of strictly convex functions is also strictly convex, and strictly convex functions (and thus $\tilde\tau_1$) have a unique minimum. For the special case $\beta = 1$, $\tilde\tau_1$ is convex but not strictly so, which means that the minimum value of $\tilde\tau_1$ is unique, but this minimum could be obtained at multiple positions $(y_0, \phi_0)$.

When we talk about N-jettiness as a jet algorithm in Sec. 5, we will encounter the potential function for iterative cone algorithms in Eq. (5.1). That potential function scales like $\min\{\Delta R^2, R_0^2\}$, where $R_0$ is a fixed parameter, and thus the potential function is not convex. This leads to a proliferation of local minima for iterative cone algorithms, and to the related problems of infrared safety in seeded cone algorithms.
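The contrast between convex and cone-like potentials can be seen in a one-dimensional toy. In this sketch the axis moves only in rapidity and local minima are located by a brute-force grid scan; both are simplifications chosen for illustration. The cone potential uses the $\min\{\Delta R^2, R_0^2\}$ form described above, and the function names are hypothetical.

```python
def tau1_1d(y0, particles, beta):
    """Unnormalized 1-subjettiness in one dimension: sum of pT * |y - y0|^beta."""
    return sum(pt * abs(y - y0) ** beta for pt, y in particles)

def cone_potential_1d(y0, particles, r0):
    """Cone-style potential: each particle contributes pT * min(dy^2, R0^2)."""
    return sum(pt * min((y - y0) ** 2, r0 ** 2) for pt, y in particles)

def count_local_minima(f, lo, hi, n=2001):
    """Count strict local minima of f on a uniform grid (brute-force scan)."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    vals = [f(x) for x in xs]
    return sum(1 for k in range(1, n - 1)
               if vals[k] < vals[k - 1] and vals[k] < vals[k + 1])

# Two particles (pT, y) separated by more than 2 * R0
parts = [(1.0, 0.0), (0.8, 3.0)]
print(count_local_minima(lambda y: tau1_1d(y, parts, 1.5), -1.0, 4.0))
print(count_local_minima(lambda y: cone_potential_1d(y, parts, 1.0), -1.0, 4.0))
```

For $\beta = 1.5$ the scan finds a single minimum between the particles, whereas the cone potential has a separate local minimum at each particle (with a flat plateau in between), which is the proliferation of local minima described in the text.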

4 Top Tagging Performance 

In this section, we investigate the tagging efficiencies for top jets and the mistag rates for QCD jets using N-subjettiness. Compared to the preliminary study in Ref. [33],¹⁰ we will use the top tagging benchmark samples from the BOOST2010 report [14]. This will enable an apples-to-apples comparison to common top tagging methods in the literature. Note, however, that the BOOST2010 samples are particle-level samples and do not include realistic detector resolutions, efficiencies, or acceptances.

4.1 Analysis Overview 

For our tagging performance study, we use benchmark samples from the BOOST2010 report [14]. These event samples are publicly available at:

• http://www.lpthe.jussieu.fr/~salam/projects/boost2010-events/

• http://tev4.phys.washington.edu/TeraScale/boost2010/

We will utilize samples from two different benchmark Monte Carlo programs which simulate proton-proton collisions at a center-of-mass energy of 7 TeV. The primary benchmark is HERWIG 6.510 [54] with a description of the underlying event from JIMMY [55] using an ATLAS tune [56]. We will also do one comparison study to PYTHIA 6.4 [52] with a $p_T$-ordered shower using the Perugia0 tune [57].

The signal sample is hadronically decaying $t\bar t$ production, and the background sample is QCD dijet production. They are divided into subsamples of equal size, with parton $p_T$ ranges from 200-300 GeV, 300-400 GeV, ..., 700-800 GeV, as shown in Fig. 6(a). Together they yield an approximately flat jet transverse momentum distribution in a kinematic regime that is interesting for new physics searches at the LHC.¹¹

¹⁰ In Ref. [33], N-subjettiness was compared to simplistic implementations of the Johns Hopkins Top Tagger [19] and the ATLAS YSplitter method [17] on event samples from the default tune of Pythia 8.135 [52, 53].

¹¹ This flat distribution in transverse momentum is of course artificial, since physical cross sections fall off with $p_T$, but it is helpful for testing the performance of tagging methods across a wide kinematic range.
Figure 6. Basic kinematics of the $t\bar t$ and dijet BOOST2010 samples, after clustering with R = 1.0 anti-$k_T$ jets. (a) Jet transverse momentum in the combined sample with parton-level $p_T$ between 200 and 800 GeV. Note that this is an unphysical $p_T$ distribution, but it serves as a useful testing ground for the various top tagging methods. (b) Jet invariant mass in the 500 GeV < $p_T$ < 600 GeV sample. An N-subjettiness cut $\tau_3/\tau_2 < 0.6$ eliminates the bulk of the QCD jets as well as top jets with a mass much smaller than $m_\text{top}$, but leaves most of the top resonance peak intact.



In accordance with the BOOST2010 report, jets are defined with the anti-$k_T$ algorithm [1] with a jet radius parameter of $R = 1.0$ using FastJet 2.4.4 [46, 47]. No simulation of detector effects is performed, but only final-state particles with pseudorapidity $|\eta| < 5.0$ (except neutrinos and muons) are considered in the jet clustering. Only the two hardest jets with $p_T > 200$ GeV are considered from each event, and efficiencies and fake rates are determined on a per-jet basis.
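The particle and jet selection just described can be sketched as follows. This is a minimal illustration, not the BOOST2010 analysis code: the `Particle` record, its field names, and both helper functions are hypothetical, the neutrino and muon exclusion assumes standard PDG particle IDs, and the anti-$k_T$ clustering itself (done with FastJet in the text) is not reproduced. For that reason, `select_jets` simply takes a list of already-clustered jet $p_T$ values.

```python
from dataclasses import dataclass

# Assumption: standard PDG particle IDs (12, 14, 16 = neutrinos; 13 = muon).
NEUTRINO_IDS = {12, 14, 16}
MUON_ID = 13


@dataclass
class Particle:
    """Hypothetical final-state particle record."""
    pid: int    # PDG particle ID (sign distinguishes particle from antiparticle)
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle


def clusterable(p: Particle, eta_max: float = 5.0) -> bool:
    """True if the particle enters jet clustering: |eta| < eta_max and not a neutrino or muon."""
    apid = abs(p.pid)
    if apid in NEUTRINO_IDS or apid == MUON_ID:
        return False
    return abs(p.eta) < eta_max


def select_jets(jet_pts, pt_min=200.0, n_max=2):
    """Keep at most the n_max hardest jets, and only those with pT > pt_min (GeV)."""
    hardest = sorted(jet_pts, reverse=True)[:n_max]
    return [pt for pt in hardest if pt > pt_min]


# Usage: filter the particles of one event before handing them to a clustering library.
event = [Particle(211, 35.0, 0.4, 1.2), Particle(13, 50.0, 0.1, 2.0), Particle(22, 4.0, 5.3, 0.0)]
inputs = [p for p in event if clusterable(p)]
```

Efficiencies and fake rates are then counted per selected jet, not per event.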

The various top tagging algorithms studied in the BOOST2010 report are summarized 
in Ref. [14] and described in more detail in the original papers. The five algorithms shown 
in Fig. 1 are referred to as "Hopkins" [19], "CMS" [58-60], "Pruning" [27, 28], "ATLAS" 
[17, 61, 62], and "Thaler/Wang" [18]. After the BOOST2010 report, two other top tagging
methods were applied to this sample [24, 25], though the comparisons in Fig. 1 and later in 
Table 2 only include the originally tested algorithms. 

The basic criterion for tagging a boosted top quark is that the jet mass should fall near $m_{\rm top} = 171$ GeV. In Fig. 6(b), we show the jet invariant mass distribution from the 500-600 GeV subsample, where one can clearly see the top resonance. One can also see that an N-subjettiness cut of $\tau_3/\tau_2 < 0.6$ substantially decreases the background in the top peak region without adversely affecting the signal much. For concreteness, we will consider the mass window 160 GeV < $m_{\rm jet}$ < 240 GeV for top jets in Sec. 4.2. The upper limit of this mass range is relatively high compared to the lower limit, because boosted top jets often acquire additional mass from the underlying event. We consider possible optimizations of the mass window in Sec. 4.3. In addition to the invariant mass cut, we will apply a cut on the ratio $\tau_3/\tau_2$, where the cut value is adjusted to trade signal tagging efficiency against background mistag rate.
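A minimal sketch of the resulting two-step tagger, assuming the jet mass and the $\tau_2$, $\tau_3$ values have already been computed for each jet. The function and argument names are hypothetical; the default mass window is the 160-240 GeV window quoted above.

```python
def is_top_tagged(m_jet, tau2, tau3, ratio_cut, m_low=160.0, m_high=240.0):
    """Top tag = jet mass inside (m_low, m_high) GeV AND tau3/tau2 < ratio_cut."""
    if not (m_low < m_jet < m_high):
        return False  # fails the top mass window
    if tau2 <= 0.0:
        return False  # degenerate jet: the ratio is undefined
    return tau3 / tau2 < ratio_cut
```

Lowering `ratio_cut` tightens the tag: fewer QCD jets pass, at the cost of signal efficiency.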

4.2 N-subjettiness Performance 

We first show the effect of the minimization procedure on the raw N-subjettiness distributions in the BOOST2010 sample. Plots of $\tau_1$, $\tau_2$, and $\tau_3$ comparing top jets and QCD jets are shown in Fig. 7, after imposing the 160 GeV < $m_{\rm jet}$ < 240 GeV criterion. As expected, the average value of $\tau_N$ is smaller using the minimum axes than using the exclusive $k_T$ axes, though the shift is less pronounced for $\beta = 2$, consistent with the discussion below Eq. (3.3).
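For reference, a sketch of how $\tau_N^{(\beta)}$ can be evaluated for a fixed set of axes. It assumes the normalization of Ref. [33] generalized to exponent $\beta$, namely $\tau_N^{(\beta)} = \frac{1}{d_0}\sum_k p_{T,k}\min\{(\Delta R_{1,k})^\beta,\ldots,(\Delta R_{N,k})^\beta\}$ with $d_0 = \sum_k p_{T,k} R_0^\beta$. The `(pt, eta, phi)` tuple layout and the function names are assumptions. The sketch does not implement the minimization of Sec. 3; it only evaluates $\tau_N$, which is why a minimizing choice of axes can never give a larger value than, say, the exclusive $k_T$ axes.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane, with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def tau_n(particles, axes, beta=1.0, r0=1.0):
    """Normalized N-subjettiness for FIXED axes.

    particles: list of (pt, eta, phi); axes: list of (eta, phi), one per candidate subjet.
    Each particle contributes pt * (distance to its nearest axis)**beta,
    and the sum is divided by d0 = sum(pt) * r0**beta.
    """
    numerator = 0.0
    d0 = 0.0
    for pt, eta, phi in particles:
        nearest = min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        numerator += pt * nearest ** beta
        d0 += pt * r0 ** beta
    return numerator / d0
```

Ratios such as $\tau_3/\tau_2$ are then formed from two calls with three and two axes; the normalization $d_0$ cancels in the ratio.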

As argued in Ref. [33], $\tau_3$ by itself is not a very good discriminant for identifying boosted top quarks. While one might naively expect that a jet with small $\tau_3$ would be more likely to be a top jet, a quark- or gluon-initiated jet can also have small $\tau_3$, as shown in Fig. 7(e). Though top jets are likely to have large $\tau_1$ and $\tau_2$, QCD jets with a diffuse spray of large-angle radiation can also have large $\tau_1$ and $\tau_2$, as shown in Figs. 7(a) and 7(c). However, those QCD jets with large $\tau_2$ typically have large values of $\tau_3$ as well, so it is in fact the ratio $\tau_3/\tau_2$ that is the preferred discriminating variable.

Plots of the $\tau_N/\tau_{N-1}$ ratios are shown in Fig. 8 for $\beta = 1$ and $\beta = 2$. Notice that with the minimization procedure, we have $\tau_N/\tau_{N-1} < 1$, as expected: the $N-1$ optimal axes plus any additional axis can only reduce each particle's distance to its nearest axis, so the minimum of $\tau_N$ cannot exceed that of $\tau_{N-1}$. By eye, there is better top/QCD separation using the minimized $\tau_N$ values, and $\tau_3/\tau_2$ with $\beta = 1$ appears to be the best single variable for discrimination. Both of these observations will be confirmed below. There is additional distinguishing power in $\tau_2/\tau_1$ and (to a smaller extent) in the raw N-subjettiness values, which will be explored in Sec. 4.3.

We now quantify the performance of $\tau_3/\tau_2$ as a boosted top tagger. In Fig. 9, we show the effect of varying a cut on $\tau_3/\tau_2$ on the signal efficiency and background mistag rate using the BOOST2010 samples. Figs. 9(a) and 9(b) show curves for the $\beta = 1$ measure using different choices for the subjet axes, and the best performance is obtained for the axes that minimize $\tau_N^{(1)}$.¹² Figs. 9(c) and 9(d) show the effect of changing the angular weighting exponent $\beta$, using the axes that minimize $\tau_N^{(\beta)}$, and the best performance is obtained for $\beta = 1$. Thus, the best tagging performance for $\tau_3/\tau_2$ is achieved by using the minimum axes with $\beta = 1$ (the jet broadening measure). When compared to other tagging methods in Fig. 1, N-subjettiness shines as a boosted top tagger (at least on the BOOST2010 benchmark samples).
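Each curve in Fig. 9 comes from sliding a one-sided cut across the signal and background distributions. The following is a generic sketch of that scan; the function name and the convention of scoring jets that fail the mass window as infinity (so they stay in the denominator but never pass) are our assumptions, not a description of the authors' code.

```python
def efficiency_curve(signal_scores, background_scores, cuts):
    """Return (cut, signal efficiency, background mistag rate) for each one-sided cut score < cut.

    Jets that fail an earlier selection (e.g. the mass window) can be passed with
    score = float("inf") so that they count in the denominator but never pass.
    """
    n_sig = len(signal_scores)
    n_bkg = len(background_scores)
    curve = []
    for c in cuts:
        eff = sum(1 for s in signal_scores if s < c) / n_sig
        mistag = sum(1 for b in background_scores if b < c) / n_bkg
        curve.append((c, eff, mistag))
    return curve


# Usage: tau3/tau2 values for a toy signal and background sample.
sig = [0.2, 0.3, 0.5, 0.9]
bkg = [0.4, 0.7, 0.8, 0.95, float("inf")]  # last jet failed the mass window
points = efficiency_curve(sig, bkg, [0.1, 0.6])
```

Plotting mistag rate against efficiency over a fine grid of cuts reproduces the shape of such curves.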

In Table 1, we show the top tagging efficiency versus QCD mistag rate as a function of jet $p_T$ for different cuts on $\tau_3/\tau_2$. At low $p_T$ (200-400 GeV), the efficiency for finding a jet within the top mass window is quite small, as an $R = 1.0$ jet is unlikely to capture all of the top decay products. For higher $p_T$ ranges (400-800 GeV), the efficiency is remarkably stable as a function of jet $p_T$. The cut $\tau_3/\tau_2 < 0.6$ yields approximately a 50% efficiency operating point, while $\tau_3/\tau_2 < 0.4$ yields approximately a 25% efficiency operating point.
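The low-$p_T$ behavior can be understood with the standard rule of thumb (not taken from this paper) that the decay products of a particle of mass $m$ and transverse momentum $p_T$ are spread over an angular size $\Delta R \approx 2m/p_T$. Taking $m_{\rm top} = 171$ GeV gives the rough estimate

$$\Delta R \approx \frac{2 m_{\rm top}}{p_T} \approx 1.7,\ 1.1,\ 0.86,\ 0.68 \quad \text{for} \quad p_T = 200,\ 300,\ 400,\ 500~\text{GeV},$$

so the decay products typically fit inside an $R = 1.0$ jet only above roughly 400 GeV. This is consistent with the trend in Table 1, although individual decays scatter widely around this typical value.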

¹² Though not shown, it is indeed the case that all other axes/measure combinations perform worse than the minimum $\beta = 1$ axes with the $\beta = 1$ measure.
Figure 7. Left column: distributions of (a) $\tau_1$, (c) $\tau_2$, and (e) $\tau_3$ for $\beta = 1$ comparing boosted top and QCD jets. For these plots, we impose an invariant mass window of 160 GeV < $m_{\rm jet}$ < 240 GeV on jets with $R = 1.0$ and 500 GeV < $p_T$ < 600 GeV. The solid bold lines are for the $\beta = 1$ minimization axes, while the dashed thin lines are for the exclusive $k_T$ axes. As expected, the minimization axes yield smaller values of $\tau_N$ than the exclusive $k_T$ axes. Right column: equivalent plots for $\beta = 2$.
Figure 8. Distributions of (a) $\tau_2^{(1)}/\tau_1^{(1)}$, (b) $\tau_2^{(2)}/\tau_1^{(2)}$, (c) $\tau_3^{(1)}/\tau_2^{(1)}$, and (d) $\tau_3^{(2)}/\tau_2^{(2)}$ for boosted top and QCD jets, using the same formatting and event selection as Fig. 7. Note that after applying the minimization procedure, all of these ratios are strictly less than 1. For boosted top identification, the best individual discriminating variable is (c) $\tau_3^{(1)}/\tau_2^{(1)}$, though (b) $\tau_2^{(2)}/\tau_1^{(2)}$ in particular contains some additional information.
Figure 9. Efficiency/mistag curves for a variable cut on $\tau_3/\tau_2$ and a fixed top mass window 160 GeV < $m_{\rm jet}$ < 240 GeV. Top row: fixing $\beta = 1$, but calculating $\tau_N^{(1)}$ using different axes: the axes that minimize the $\beta = 1$ measure ("lin"), the axes from exclusive $k_T$ ("$k_T$"), and the axes that minimize the $\beta = 2$ measure ("quad"). The left panel is for the 500 GeV < $p_T$ < 600 GeV sample, while the right panel is for the entire $p_T$ range. We see that using $\tau_N^{(1)}$ with the corresponding minimization axes ("lin") gives the best performance. Bottom row: changing $\beta$, but always using the axes that minimize the corresponding $\tau_N^{(\beta)}$ measure. The jet broadening measure ($\beta = 1$) gives the best performance.
Parton $p_T$ range (GeV)        200-300          300-400          400-500
No $\tau_3/\tau_2$ cut          .069 : .0054     .34 : .034       .59 : .092
$\tau_3/\tau_2 < \ldots$        .065 : .0049     .32 : .027       .58 : .073
$\tau_3/\tau_2 < 0.6$           .049 : .0016     .26 : .0085      .47 : .022
$\tau_3/\tau_2 < 0.4$           .026 : .0003     .15 : .0011      .25 : .0033
$\tau_3/\tau_2 < 0.3$           .012 : .0000     .079 : .0003     .12 : .0006

Parton $p_T$ range (GeV)        500-600          600-700          700-800
No $\tau_3/\tau_2$ cut          .71 : .14        .75 : .17        .75 : .19
$\tau_3/\tau_2 < \ldots$        .70 : .11        .74 : .13        .74 : .14
$\tau_3/\tau_2 < 0.6$           .55 : .028       .57 : .034       .57 : .035
$\tau_3/\tau_2 < 0.4$           .28 : .0039      .27 : .0040      .27 : .0044
$\tau_3/\tau_2 < 0.3$           .13 : .0005      .13 : .0011      .12 : .0008

Table 1. Efficiencies vs. mistag rates (top jets : QCD jets) for each of the HERWIG parton $p_T$ subsamples. The top row corresponds to just applying the $m_{\rm top}$ invariant mass window (160 GeV to 240 GeV), and the subsequent rows include additional $\tau_3/\tau_2$ cuts. Once the top quarks have sufficient $p_T$ for their decay products to be collimated, N-subjettiness exhibits fairly uniform performance as a function of $p_T$.

Finally, since boosted top identification depends on the precise radiation pattern within 
a jet, it is subject to potentially large uncertainties from Monte Carlo modeling of the parton 
shower, hadronization, and underlying event. An indication of these uncertainties can be seen 
in Table 2, which compares the various top tagging algorithms on the HERWIG and PYTHIA 
samples. The relative performance between different algorithms is consistent between the 
programs, though the absolute fake rates in the PYTHIA sample are significantly smaller. 
We conclude that while the tagging performance of N-subjettiness does have Monte Carlo
modeling dependence, it is no worse than for other proposed tagging methods.

4.3 Multivariate Methods 

In the previous subsection, we saw that a simple cut on the ratio $\tau_3/\tau_2$ with a fixed top mass window 160 GeV < $m_{\rm jet}$ < 240 GeV yielded impressive top tagging performance. Here, we will explore whether multivariate classification might be able to further optimize tagging performance. One feature of N-subjettiness is that the entire set of $\tau_N^{(\beta)}$ values for various choices of $N$ and $\beta$ can be calculated on a jet-by-jet basis and then used as inputs to multivariate methods.¹³ In principle, N-subjettiness could be used in tandem with some of the other top tagging methods, though we have not done a systematic study of that possibility.

¹³ In addition, $\tau_N$ is a sum over sub-$\tau$ values for each Voronoi region. These sub-$\tau$ values could be used as well, though we did not find any particular gain when including them in our multivariate studies.
HERWIG results
Tagger                    eff. (%)   mistag rate (%)   eff. (%)   mistag rate (%)
Hopkins [19]              20         0.4 ± 0.02        50         4.9 ± 0.06
CMS [58-60]               20         0.4 ± 0.02        50         5.2 ± 0.06
Pruning [27, 28]          20         0.3 ± 0.02        50         7.6 ± 0.08
ATLAS [17, 61, 62]        20         0.7 ± 0.02        50         4.6 ± 0.06
T/W [18]                  20         1.5 ± 0.04        50         6.0 ± 0.07
$\tau_3/\tau_2$           20         0.23 ± 0.01       50         4.12 ± 0.06
Multivariate $\tau_N$     20         0.18 ± 0.01       50         2.96 ± 0.05

PYTHIA results
Tagger                    eff. (%)   mistag rate (%)   eff. (%)   mistag rate (%)
Hopkins                   20         0.2 ± 0.01        47         3.2 ± 0.05
CMS                       22         0.3 ± 0.01        49         3.5 ± 0.05
Pruning                   19         0.2 ± 0.01        49         4.5 ± 0.06
ATLAS                     18         0.5 ± 0.02        49         3.1 ± 0.05
T/W                       18         0.8 ± 0.02        57         7.0 ± 0.08
$\tau_3/\tau_2$           18         0.14 ± 0.01       49         2.63 ± 0.05
Multivariate $\tau_N$     18         0.11 ± 0.01       48         1.84 ± 0.04

Table 2. Summary of tagging efficiencies vs. mistag rates at different working points for a number of top taggers, including the $\tau_3/\tau_2$ cut in Sec. 4.2 and the multivariate $\tau_N$ method in Sec. 4.3. The performance numbers for the other taggers are taken from Ref. [14] and described in more detail there. The parameters are chosen such that all taggers run at 20% and 50% efficiency for the HERWIG samples, and the same parameters are then applied to the PYTHIA sample. Statistical errors on the mistag rate are indicated, and the efficiency numbers have uncertainties of 0.1%. There are systematic shifts in the tagging efficiencies and mistag rates between the two Monte Carlo programs, though the discrepancies for N-subjettiness tagging are similar in magnitude to those for other tagging methods.

To motivate the potential power of multivariate methods, consider Fig. 10. In Fig. 10(a), we show distributions in the $\tau_2$ vs. $\tau_3$ plane, which demonstrate (as expected) that $\tau_3/\tau_2$ is a better discriminating variable than $\tau_2$ or $\tau_3$ alone. We saw in Fig. 8(b) that the ratio $\tau_2/\tau_1$ might also have discriminating power, but rectangular cuts on that ratio did not yield much improvement over using $\tau_3/\tau_2$ alone. However, in Fig. 10(b) we see that top jets and QCD jets are fairly well separated in the $\tau_2/\tau_1$ vs. $\tau_3/\tau_2$ plane, and multivariate analyses can capitalize on these kinds of correlations.

Figure 10. Density plots in (a) the $\tau_2$ vs. $\tau_3$ plane and (b) the $\tau_2/\tau_1$ vs. $\tau_3/\tau_2$ plane for boosted top and QCD jets. The selection criteria are the same as in Fig. 7. While a linear cut on the $\tau_3/\tau_2$ ratio is clearly an effective boosted top tagger, there is additional information in other N-subjettiness variables that can be used in a multivariate method.

Though there are a variety of multivariate classification methods one could study, we will focus on a (modified) linear Fisher discriminant [63, 64], since it is a straightforward extension of the one-dimensional one-sided cut $\tau_3/\tau_2 < c$. The Fisher discriminant uses multi-dimensional information to form a linear one-sided cut:

$$L \cdot X < c, \tag{4.1}$$



where $X$ is a vector of jet characteristics, such as N-subjettiness values and jet mass, while $L$ encodes the linear weights of each of these attributes. In geometric language, the goal of a Fisher discriminant is to find the set of parallel hyperplanes (each one encoded by its own $c$ parameter) with a common normal vector $L$ that best separates the top jet signal from the QCD jet background in the space $\mathbb{R}^{\dim(X)}$.

The standard (but not necessarily optimal) way to choose $L$ is to take the $L \cdot X$ distributions for the two classes and calculate the $L$ that maximizes the variance between the two classes compared to the variance within each class. We will instead use a modified Fisher discriminant [64], which is better suited for defining efficiency/rejection curves:



$$L = (\Sigma_{\rm QCD} + \gamma\,\Sigma_{\rm top})^{-1}(\mu_{\rm QCD} - \mu_{\rm top}), \tag{4.2}$$



where $\Sigma$ are the covariance matrices and $\mu$ are the mean vectors for the variables in $X$. The standard Fisher discriminant takes $\gamma = 1$, but we found that lower values of $\gamma$ were preferred for tagging performance, especially if $\tau_3/\tau_2$ is one of the variables in $X$.
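The modified Fisher weights of Eq. (4.2) take only a few lines of linear algebra to compute. The sketch below is a plain-Python illustration (no external libraries) that takes per-class lists of feature vectors. The function names are hypothetical, it uses sample covariances with an $n-1$ denominator (an assumption, since the text does not specify one), and it solves the linear system by Gaussian elimination rather than forming the inverse explicitly. With this sign convention, top jets populate small values of $L \cdot X$, matching the one-sided cut of Eq. (4.1).

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]


def covariance(rows):
    """Sample covariance matrix (denominator n - 1)."""
    mu = mean_vector(rows)
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def modified_fisher(top_rows, qcd_rows, gamma=0.7):
    """L = (Sigma_QCD + gamma * Sigma_top)^(-1) (mu_QCD - mu_top)."""
    s_qcd, s_top = covariance(qcd_rows), covariance(top_rows)
    d = len(s_qcd)
    a = [[s_qcd[i][j] + gamma * s_top[i][j] for j in range(d)] for i in range(d)]
    diff = [q - t for q, t in zip(mean_vector(qcd_rows), mean_vector(top_rows))]
    return solve(a, diff)


# Usage with two toy features per jet.
top = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
qcd = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
L = modified_fisher(top, qcd, gamma=1.0)
```

Setting `gamma=1.0` recovers the standard Fisher direction. Smaller values give the top-jet covariance less weight in the metric.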

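As an instructive example, we include the following N-subjettiness variables and ratios to define a discriminant:

$$\tau_1^{(1)},\quad \tau_2^{(1)},\quad \tau_3^{(1)},\quad \frac{\tau_2^{(1)}}{\tau_1^{(1)}},\quad \frac{\tau_3^{(1)}}{\tau_2^{(1)}},\quad \frac{\tau_2^{(2)}}{\tau_1^{(2)}},\quad \frac{\tau_3^{(2)}}{\tau_2^{(2)}}. \tag{4.3}$$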



Tagger                                                        20% Efficiency    50% Efficiency
$\tau_3/\tau_2$:   160 GeV < $m_{\rm jet}$ < 240 GeV,
                   $\tau_3/\tau_2 < c$                        c = 0.384         c = 0.680
Multivariate:      160 GeV < $m_{\rm jet}$ < 280 GeV,
                   $L \cdot X < c$                            c = 3.51          c = 5.29
                   L = [2.30, -5.85, -1.89, 6.21, 7.25, -5.35, -0.86, 1.61, -14.07]
                   X = [$\tau_1^{(1)}$, $\tau_2^{(1)}$, $\tau_3^{(1)}$, $\tau_2^{(1)}/\tau_1^{(1)}$, $\tau_3^{(1)}/\tau_2^{(1)}$, $\tau_2^{(2)}/\tau_1^{(2)}$, $\tau_3^{(2)}/\tau_2^{(2)}$, $(\Delta m/m)_+$, $(\Delta m/m)_-$]

Table 3. Optimized parameters for the N-subjettiness taggers at different working points. These parameters are used for the results in Table 2. Both the $\tau_3/\tau_2$ method and the multivariate method make use of a linear one-sided cut. The parameters for the modified Fisher discriminant were obtained from Eq. (4.2) with $\gamma = 0.7$.



In addition, we include jet mass information in the variables

$$\left(\frac{\Delta m}{m}\right)_+ = \max\left\{\frac{m_{\rm jet} - m_{\rm top}}{m_{\rm top}},\, 0\right\}, \qquad \left(\frac{\Delta m}{m}\right)_- = \min\left\{\frac{m_{\rm jet} - m_{\rm top}}{m_{\rm top}},\, 0\right\}, \tag{4.4}$$

where we have separated out jet mass values above and below the top mass in order to effectively adjust the top mass window. Note that if the goal is to obtain a function $L \cdot X$ that works for a variety of operating efficiencies, the Fisher discriminant is not always improved by adding additional variables. In particular, we found more uniform performance across the whole efficiency/rejection curve by including neither $\tau_N^{(2)}$ values nor jet $p_T$. We also found that performance was improved by first applying a uniform cut on jet mass, 160 GeV < $m_{\rm jet}$ < 280 GeV.¹⁴
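The N-subjettiness variables, the mass variables of Eq. (4.4), and the linear cut of Eq. (4.1) can be assembled as sketched below. This illustration rests on several assumptions. The variable ordering is our reading of the $X$ vector in Table 3, $m_{\rm top} = 171$ GeV is taken from Sec. 4.1, and the $\tau$ inputs must be normalized exactly as in the paper for the Table 3 weights to be meaningful. All function names are hypothetical.

```python
# Linear weights L from Table 3 (modified Fisher discriminant with gamma = 0.7).
L_WEIGHTS = [2.30, -5.85, -1.89, 6.21, 7.25, -5.35, -0.86, 1.61, -14.07]
M_TOP = 171.0  # GeV, value quoted in Sec. 4.1


def mass_features(m_jet, m_top=M_TOP):
    """(Delta m / m)_+ and (Delta m / m)_- as in Eq. (4.4)."""
    x = (m_jet - m_top) / m_top
    return max(x, 0.0), min(x, 0.0)


def feature_vector(t1, t2, t3, t1_b2, t2_b2, t3_b2, m_jet):
    """Assumed ordering: beta=1 taus, beta=1 ratios, beta=2 ratios, mass terms."""
    dm_plus, dm_minus = mass_features(m_jet)
    return [t1, t2, t3, t2 / t1, t3 / t2, t2_b2 / t1_b2, t3_b2 / t2_b2, dm_plus, dm_minus]


def fisher_tag(x, m_jet, c, weights=L_WEIGHTS, m_low=160.0, m_high=280.0):
    """Apply the uniform mass cut first, then the one-sided linear cut L . X < c."""
    if not (m_low < m_jet < m_high):
        return False
    return sum(w * xi for w, xi in zip(weights, x)) < c


# Usage: a toy jet at 180 GeV, tested at the 50% (c = 5.29) and 20% (c = 3.51) working points.
x = feature_vector(0.3, 0.1, 0.05, 0.2, 0.05, 0.02, 180.0)
passes_50, passes_20 = fisher_tag(x, 180.0, 5.29), fisher_tag(x, 180.0, 3.51)
```

For this toy jet $L \cdot X \approx 4.1$, so it passes the looser working point but not the tighter one.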

Applying Eq. (4.2) with $\gamma = 0.7$, we found the linear coefficients $L$ listed in Table 3. A plot of $L \cdot X$ appears in Fig. 11, which shows excellent signal/background separation. With these parameters, a sliding cut on $L \cdot X$ defines the efficiency curves in Fig. 12. Compared to the simple $\tau_3/\tau_2$ cut in Sec. 4.2, we see about a 20% decrease in the mistag rate for fixed efficiency. Table 2 tests the performance of the Fisher discriminant between HERWIG and PYTHIA, and we see that the relative improvement from using a multivariate method is consistent between the programs. Finally, we compare the multivariate $\tau_N$ selection to other top taggers in Fig. 1, where we again see excellent performance.

¹⁴ Including jets with masses below 160 GeV threw off the determination of $L$ and thus the tagging efficiency (even though we use jet mass information in $L$). We suspect that QCD jets at lower jet masses have a distinctly different covariance matrix, making it difficult to obtain high-purity signal separation.
Figure 11. Plot of the linear discriminant $L \cdot X$ for boosted top and QCD jets. The value of $L$ is given in Table 3, and a top mass window 160 GeV < $m_{\rm jet}$ < 280 GeV has already been applied. This linear discriminant has more separation power than the $\tau_3^{(1)}/\tau_2^{(1)}$ variable in Fig. 8(c).
Figure 12. Efficiency/mistag curves for the linear discriminant. Here, we compare a cut on $\tau_3/\tau_2$ to a cut on $L \cdot X$, and find roughly a 20% improvement in the mistag rate for fixed efficiency, though the simple $\tau_3/\tau_2$ cut performs better at very small mistag rates. These are the same curves that appear in Fig. 1, albeit with a different range on the vertical axis to highlight the small mistag rate region.
5 N-jettiness as a Jet Algorithm



The minimization procedure in Sec. 3 for individual jets can be extended to the full event, thus minimizing (some version of) N-jettiness [34]. The axes and regions determined by this procedure would then define a jet algorithm, with the important caveat that the number of jets is fixed at $N$. Here, we briefly sketch how such an algorithm might be used at a hadron collider, leaving a more complete study to future work.

5.1 Previous Literature 

The idea of using minimization to define jets is not new, and we will briefly mention some of 
the previous literature. Cluster optimization is a rich area of study in computer science, but 
has only been used in limited cases for jet physics. 

The most basic example is in $e^+e^-$ collisions, where the axis that maximizes thrust (equivalently, minimizes $1 - T$) is used to define a hemisphere jet algorithm. To our knowledge, the first use of the k-means clustering algorithm as a jet finder was given in Ref. [65], where two all-hadronic channels with fixed jet multiplicity were studied, $e^+e^- \to t\bar{t}$ ($N = 6$) and $e^+e^- \to W^+W^-$ ($N = 4$). Jet finding can be seen more generally as an optimization problem, with different optimization measures proposed in Refs. [66-72].

The most well-known application of minimization for jets is in iterative cone finding. Stable cones are cones for which the jet axis and the jet 3-momentum are aligned. The procedure outlined in Ref. [73] for finding stable cones (in the $p_T$ scheme) is equivalent to minimizing

$$\tau_1^{(2)}(R_0) = \sum_i p_{T,i} \min\left\{\Delta R_i^2,\, R_0^2\right\}, \tag{5.1}$$

where $\Delta R_i$ is the distance of particle $i$ from the cone axis. The reader will recognize this as (unnormalized) 1-subjettiness with the thrust measure ($\beta = 2$) and an additional $R_0$ cutoff. The choice $\beta = 2$ is important since only then does the minimization criterion for $\tau_1^{(2)}(R_0)$ enforce jet axis/momentum alignment (see Eq. (3.3)). Of course, stable cone finding by itself is not sufficient for defining a jet algorithm because cones generically overlap, so iterative cone finding is usually augmented with a split-merge procedure.
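To see why $\beta = 2$ is special, here is a short flat-space derivation. It is our own illustration, not a reproduction of Eq. (3.3): it works in the small-angle approximation, where each particle sits at a point $\vec{x}_i$ in the rapidity-azimuth plane, and holds cone membership fixed. Particles outside $R_0$ contribute the constant $R_0^2$ and drop out of the gradient, so for $\beta = 2$

$$\frac{\partial}{\partial \vec{a}} \sum_{i \in {\rm cone}} p_{T,i}\,|\vec{x}_i - \vec{a}|^2 = -2 \sum_{i \in {\rm cone}} p_{T,i}\,(\vec{x}_i - \vec{a}) = 0 \quad\Longrightarrow\quad \vec{a} = \frac{\sum_{i \in {\rm cone}} p_{T,i}\,\vec{x}_i}{\sum_{i \in {\rm cone}} p_{T,i}},$$

which is the $p_T$-weighted centroid, i.e. the $p_T$-scheme jet direction. For $\beta = 1$ the stationarity condition instead reads

$$\sum_{i \in {\rm cone}} p_{T,i}\,\frac{\vec{x}_i - \vec{a}}{|\vec{x}_i - \vec{a}|} = 0,$$

which defines a $p_T$-weighted geometric median. It generally does not coincide with the centroid, so the $\beta = 1$ axis need not align with the jet momentum.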

Note that $\tau_1^{(2)}(R_0)$ does not have a monotonically increasing first derivative in radial directions, so it generically has many local minima (see Sec. 3.3), leading to the famous problems of infrared safety of seeded cone algorithms (see Refs. [2, 74] for a review). Interestingly, the anti-$k_T$ algorithm [1] acts like an idealized cone algorithm when applied to only the hardest jet in an event, and it tends to do an excellent job of minimizing $\tau_1^{(2)}(R_0)$ without any seeding problems.

5.2 Extension to N-jettiness 

It is now straightforward to extend Eq. (5.1) to N-jettiness, and thereby use minimization to define a fixed-N cone jet algorithm for hadronic collisions. Including both a jet radius cutoff $R_0$ and a beam pseudorapidity cut $\eta_0$, one possible definition of N-jettiness is:

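$$\tau_N(R_0,\eta_0) = \sum_k p_{T,k} \min\left\{ \left(\frac{e^{-\eta_k}}{e^{-\eta_0}}\right)^{\gamma},\ \left(\frac{e^{\eta_k}}{e^{-\eta_0}}\right)^{\gamma},\ \left(\frac{\Delta R_{1,k}}{R_0}\right)^{\beta},\ \ldots,\ \left(\frac{\Delta R_{N,k}}{R_0}\right)^{\beta},\ 1 \right\}. \tag{5.2}$$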

The first two entries in the minimum are "beam measures", the next $N$ entries are "jet measures", and the final entry defines unclustered momentum. The exponents $\gamma$ and $\beta$ are angular weighting exponents for the beam measure and the jet measure, respectively, and the choice $\beta = 2$ and $\gamma = \infty$ is similar in spirit to traditional iterative cone finding with hard cutoffs on $R_0$ and $\eta_0$.

While there are a number of different distance measures that could be used to define N-jettiness, this one is well suited for hadronic collisions, since it is boost invariant along the beam axis and yields circular cones in rapidity/azimuth. The quantity $\tau_N(R_0,\eta_0)$ corresponds roughly to unclustered $p_T$, so minimizing $\tau_N(R_0,\eta_0)$ is essentially maximizing the amount of radiation contained in $N$ cones. Unlike iterative cone algorithms, which require a split-merge procedure, minimizing N-jettiness automatically splits overlapping cones at the Voronoi edges. Of course, one could define the jets entirely by the Voronoi regions by taking $R_0$ to be very large.¹⁵

The minimization procedure for $\tau_N(R_0,\eta_0)$ is nearly identical to that of Sec. 3.1, with one important change. At each stage of the iteration, the only particles that participate in the axes update step are those for which one of the jet measures is the smallest entry in the minimum. In this way, the beam measure and $R_0$ affect which particles can be clustered into jets, but not the way in which they are clustered. As in Sec. 3.1, different values of $\beta$ require different update steps.
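The modified update step can be illustrated with a Lloyd-style iteration. The sketch below makes several simplifying assumptions that are ours, not the paper's: it works in a flat $(\eta, \phi)$ plane without azimuthal wrap-around, handles only $\beta = 2$ (where the update is the $p_T$-weighted centroid of each cone's members) with $\gamma = \infty$ (a hard $|\eta| < \eta_0$ cut), and evaluates $\tau_N$ in a normalized form in which a particle outside every cone contributes its full $p_T$. It does not reproduce the actual update steps of Sec. 3.1, and all names are hypothetical.

```python
import math


def assign(particles, axes, r0, eta0):
    """Index of the nearest axis when that jet measure is the smallest entry, else None (beam/unclustered)."""
    labels = []
    for pt, eta, phi in particles:
        if abs(eta) > eta0:  # gamma = infinity: beam measure vanishes beyond eta0
            labels.append(None)
            continue
        d2 = [(eta - ae) ** 2 + (phi - ap) ** 2 for ae, ap in axes]
        j = min(range(len(axes)), key=d2.__getitem__)
        labels.append(j if d2[j] < r0 ** 2 else None)  # beyond R0: unclustered
    return labels


def njettiness(particles, axes, r0=1.0, eta0=5.0):
    """Sum of pt * min{beam, (dR/R0)^2, ..., 1} under the stated assumptions."""
    total = 0.0
    for (pt, eta, phi), lab in zip(particles, assign(particles, axes, r0, eta0)):
        if abs(eta) > eta0:
            continue  # absorbed by the beam at zero cost
        if lab is None:
            total += pt  # unclustered: the final entry 1 is the minimum
        else:
            ae, ap = axes[lab]
            total += pt * ((eta - ae) ** 2 + (phi - ap) ** 2) / r0 ** 2
    return total


def minimize(particles, seeds, r0=1.0, eta0=5.0, n_iter=50, tol=1e-9):
    """Alternate assignment and centroid updates; only clustered particles move the axes."""
    axes = list(seeds)
    for _ in range(n_iter):
        labels = assign(particles, axes, r0, eta0)
        new_axes = []
        for j, axis in enumerate(axes):
            members = [p for p, lab in zip(particles, labels) if lab == j]
            w = sum(pt for pt, _, _ in members)
            if w == 0.0:
                new_axes.append(axis)  # empty cone: keep the previous axis
                continue
            new_axes.append((sum(pt * eta for pt, eta, _ in members) / w,
                             sum(pt * phi for pt, _, phi in members) / w))
        shift = max(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(axes, new_axes))
        axes = new_axes
        if shift < tol:
            break
    return axes


# Usage: two prongs, one unclustered particle, and one forward particle beyond eta0.
event = [(100.0, 0.0, 0.0), (100.0, 0.2, 0.0), (50.0, 0.0, 2.0), (50.0, 0.0, 2.2),
         (10.0, 0.0, -1.5), (30.0, 6.0, 0.0)]
seeds = [(0.3, 0.3), (0.2, 1.8)]
axes = minimize(event, seeds)
```

Like k-means, each assignment and update step cannot increase $\tau_N$, but the result is only a local minimum that depends on the seeds.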

In preliminary studies, we find that the jet regions determined by N-jettiness are very similar to the $N$ hardest jets returned by the anti-$k_T$ algorithm. Fig. 13 shows an event display where the anti-$k_T$ region for $R = 1.0$ is closely aligned with the Voronoi regions defined by $\tau_2^{(\beta,\infty)}$ with $R_0 = 1.0$ and $\eta_0 = 5.0$.¹⁶ However, there is a crucial difference: for any process with well-separated jets, 2-jettiness yields two perfect cones by definition, whereas the anti-$k_T$ jet areas can be modified by the presence of a nearby third jet (even if only two jets are studied). For $\beta = 1$, the jet axis can move substantially, though the actual jet constituents are quite similar.

We can quantify the difference between the anti-$k_T$ jets and the N-jettiness jets using the BOOST2010 samples. As demonstrated in Fig. 14, the two hardest jets determined by anti-$k_T$ are closely aligned with the axes found by 2-jettiness minimization with $\beta = 2$ ($\Delta R < 0.02$), and the $p_T$ values of the resulting jets are quite similar ($|\Delta p_T|/p_T \lesssim 0.05$). However, there is a tail to the distribution where the anti-$k_T$ jets have smaller $p_T$ than the N-jettiness jets, due to the presence of a nearby third jet. As expected, there is a much larger change in the jet direction for $\beta = 1$ (though the actual $p_T$ of the jet is rather stable), and this difference may be useful for studying jet systematics. In particular, note that $\Delta p_T$ between anti-$k_T$ and $\beta = 1$ jets is roughly symmetric about zero. For identifying moderately boosted tops, Fig. 15 shows how the top decay products are more likely to be clustered into the same jet with 2-jettiness minimization than with the anti-$k_T$ algorithm.

¹⁵ It is also possible to further generalize the definition of N-subjettiness (and the minimization algorithm) to include "fuzzy edges" through partial assignment of particles to clusters. Instead of using absolute Voronoi assignment, one could assign a particle to all clusters but with normalized weight factors that are negatively correlated with the distance to the respective cluster centers, similar to Ref. [70].

¹⁶ We in fact use the anti-$k_T$ jets (plus noise) as the seed axes for the minimization procedure.

Figure 13. A boosted $t\bar{t}$ event display comparing the 2-jettiness minimization procedure for $\beta = 1$ and $\beta = 2$ to the anti-$k_T$ jet algorithm. All three methods use $R = 1.0$. 2-jettiness yields perfectly circular cones, while the two hardest anti-$k_T$ jets can be modified by the presence of a third jet. The cluster of particles in the lower half of the figure is arranged into jets of $p_T \approx$ 233/231/231 GeV for the three respective methods ($\beta = 1$/$\beta = 2$/anti-$k_T$). The anti-$k_T$ and $\beta = 2$ axes are well aligned for this jet, while the $\beta = 1$ axis is offset from the former two. The cluster of particles in the top half of the figure has jets of $p_T$ = 235/226 GeV for the $\beta = 1$/$\beta = 2$ cones and is split into two jets of $p_T$ = 167 GeV (red) and $p_T$ = 103 GeV (yellow) with the anti-$k_T$ algorithm.

5.3 Discussion 

There are a number of potential benefits to using N-jettiness as a jet algorithm. First, as advocated in Ref. [34], N-jettiness is a way to define exclusive N-jet samples, and there is a growing interest in calculating (and resumming) N-jettiness distributions [75-79]. Second, for inclusive N-jet samples, minimizing $\tau_N$ simultaneously determines the jet regions and gives a quality measure for the jet reconstruction (namely $\tau_N$ itself, corresponding roughly to unclustered $p_T$). Third, unlike traditional iterative cone finding, N-jettiness automatically incorporates a "split-merge step" into the cone finding. In particular, the stable cones found in N-jettiness include "recoil", meaning that (for $\beta = 2$) the jet axis and the jet 3-momentum are aligned even when cones collide, a behavior that is similar to anti-$k_T$.¹⁷ Finally, the angular exponents $\beta$ and $\gamma$ have no known analog in traditional jet algorithms, and adjusting their values may be useful to test the robustness of jet finding. The jet broadening measure ($\beta = 1$) is particularly intriguing given its power for boosted object tagging and its novelty relative to standard jet algorithms.

Figure 14. Comparison of the two hardest jets found with anti-$k_T$ to the jets found with 2-jettiness minimization. The 500 GeV < $p_T$ < 600 GeV event samples are used without any cut on the jet mass. Shown is the $\Delta R$ difference in the jet axes compared to the fractional difference in $p_T$ (anti-$k_T$ minus N-jettiness, divided by anti-$k_T$). Left: $\tau_N$ minimization with $\beta = 1$. The jet broadening measure does not require the jet axis to align with the momentum axis, but the resulting jets have comparable $p_T$. Right: $\tau_N$ minimization with $\beta = 2$. Both the thrust measure and the anti-$k_T$ algorithm enforce jet axis/momentum alignment, yielding small $\Delta R$ separation. There is, however, a tail region where a third jet is identified by anti-$k_T$, decreasing the $p_T$ of the second hardest jet.

Figure 15. Invariant mass of jets found with 2-jettiness minimization and the anti-$k_T$ algorithm for low to moderate parton $p_T$. For 200 GeV < $p_T$ < 300 GeV in (a), the top mass peak is more than twice as prominent for 2-jettiness jets as for anti-$k_T$ jets. For 300 GeV < $p_T$ < 400 GeV in (b), the beneficial effect on top mass reconstruction is less pronounced though still significant. For parton $p_T$ greater than 400 GeV, where the decay products are more collimated, the effect disappears. The 2-jettiness jets were found with $\beta = 2$ minimization with the two hardest anti-$k_T$ jets used as seeds (no noise added to the coordinates).

There are also a number of challenges for using N-jettiness as a jet algorithm. From an algorithmic point of view, N-jettiness minimization yields well-defined jets as long as the global minimum for $\tau_N$ is found, but just like for the k-means algorithm, finding the global minimum can be challenging. The iterative procedure in Sec. 3.1 often converges to a local minimum for poorly chosen seeds, especially with the $R_0$ cut. For practical purposes, it may be necessary to use an infrared/collinear-safe method (such as anti-$k_T$) to determine seed axes, and then be satisfied with finding a local minimum. Though not a major concern with modern computers, N-jettiness minimization is significantly slower than recursive clustering, especially with a large number of particles. From a physics point of view, the fact that $N$ is fixed means that this algorithm does not define non-overlapping N-jet samples. For analyses where the number of jets is known ahead of time, this is not an issue, but for more general searches this may be a liability.

In the context of boosted hadronic objects, the fact that the number of cones is fixed at $N$ suggests a way to smoothly interpolate between traditional jet studies and jet substructure. For example, minimizing 6-jettiness on a boosted $t\bar{t}$ sample could in principle find all of the top decay products in both the boosted and non-boosted regimes. In practice, this procedure is complicated by initial state radiation (ISR), since when minimizing N-jettiness, there is competition between splitting a fat jet into smaller jets and trying to minimize unclustered $p_T$ by identifying ISR jets. Such competition could be alleviated by using ($>N$)-jettiness, at the expense of complicating the analysis.

6 Conclusions 

Jets are an important probe of short distance physics, as they offer a window to phenomena 
beyond the standard model. The goal of jet substructure techniques is to maximize the physics 
reach for jets, and these "fat jet" methods are helpful for exploring extreme kinematic regimes 
with boosted hadronic resonances. 

In this context, N-subjettiness is a particularly interesting jet shape, since it directly measures the N-prong nature of a jet. As originally defined in Ref. [33], N-subjettiness required external input to determine the $N$ candidate subjet directions, and therefore had residual algorithmic dependence. In this paper, we have shown how a modified version of the k-means clustering algorithm can be generalized to minimize $\tau_N$, and the minimum value of N-subjettiness is then a true jet shape.

¹⁷ Note that anti-$k_T$ and N-jettiness have different ways of dealing with overlapping cones: N-jettiness splits jets democratically in area, while anti-$k_T$ preferentially allows the harder jets to remain circular.
Using the BOOST2010 benchmarks, we have shown that N-subjettiness is a successful boosted top tagger, validating the preliminary study in Ref. [33]. The ratio $\tau_3/\tau_2$ is an effective discriminant between top jets and QCD jets, especially if one uses the jet broadening measure ($\beta = 1$) and our minimization technique. Additional discrimination power is possible using multivariate techniques, and a modified Fisher discriminant incorporating jet mass, $\tau_N$, and $\tau_N/\tau_{N-1}$ is particularly promising. It would be interesting to study whether other top tagging methods could be improved with $\tau_N$ information, and to test the performance of $\tau_N$ minimization on boosted 2-prong objects like W, Z, or Higgs bosons.

Finally, the procedure to minimize N-subjettiness on a single jet can be used to minimize N-jettiness on an entire event. This allows N-jettiness to define a fixed-N cone jet algorithm. While there have been a few attempts in the past to define jets in terms of minimization or optimization, N-jettiness has the benefit that it is closely related to well-understood iterative cone algorithms, but does not suffer from the ambiguities of split-merge procedures. The fact that N-jettiness includes an adjustable angular weighting exponent may prove useful, as one could interpolate between standard $\beta = 2$ weighting and more exotic $\beta = 1$ weighting. As the LHC continues to explore new (and old) physics with jets, we are encouraged by this interesting connection between jet substructure observables and jet finding algorithms.